Commit 20e6926

Yinghai Lu authored and torvalds committed
x86, ACPI, mm: Revert movablemem_map support
Tim found:

  WARNING: at arch/x86/kernel/smpboot.c:324 topology_sane.isra.2+0x6f/0x80()
  Hardware name: S2600CP
  sched: CPU #1's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
  smpboot: Booting Node 1, Processors #1
  Modules linked in:
  Pid: 0, comm: swapper/1 Not tainted 3.9.0-0-generic #1
  Call Trace:
    set_cpu_sibling_map+0x279/0x449
    start_secondary+0x11d/0x1e5

Don Morris reproduced on a HP z620 workstation, and bisected it to
commit e8d1955 ("acpi, memory-hotplug: parse SRAT before memblock
is ready")

It turns out movable_map has some problems, and it breaks several things

1. numa_init is called several times, NOT just for srat. so those
	nodes_clear(numa_nodes_parsed)
	memset(&numa_meminfo, 0, sizeof(numa_meminfo))
   can not be just removed. Need to consider sequence is: numaq, srat, amd, dummy.
   and make fall back path working.

2. simply split acpi_numa_init to early_parse_srat.
   a. that early_parse_srat is NOT called for ia64, so you break ia64.
   b. for (i = 0; i < MAX_LOCAL_APIC; i++)
	set_apicid_to_node(i, NUMA_NO_NODE)
      still left in numa_init. So it will just clear result from early_parse_srat.
      it should be moved before that....
   c. it breaks ACPI_TABLE_OVERIDE...as the acpi table scan is moved
      early before override from INITRD is settled.

3. that patch TITLE is total misleading, there is NO x86 in the title,
   but it changes critical x86 code. It caused x86 guys did not
   pay attention to find the problem early. Those patches really should
   be routed via tip/x86/mm.

4. after that commit, following range can not use movable ram:
   a. real_mode code.... well..funny, legacy Node0 [0,1M) could be hot-removed?
   b. initrd... it will be freed after booting, so it could be on movable...
   c. crashkernel for kdump...: looks like we can not put kdump kernel above 4G
      anymore.
   d. init_mem_mapping: can not put page table high anymore.
   e. initmem_init: vmemmap can not be high local node anymore. That is
      not good.

If node is hotplugable, the mem related range like page table and
vmemmap could be on the that node without problem and should be on that
node.

We have workaround patch that could fix some problems, but some can not
be fixed.

So just remove that offending commit and related ones including:

 f7210e6 ("mm/memblock.c: use CONFIG_HAVE_MEMBLOCK_NODE_MAP to
    protect movablecore_map in memblock_overlaps_region().")

 01a178a ("acpi, memory-hotplug: support getting hotplug info from
    SRAT")

 27168d3 ("acpi, memory-hotplug: extend movablemem_map ranges to
    the end of node")

 e8d1955 ("acpi, memory-hotplug: parse SRAT before memblock is
    ready")

 fb06bc8 ("page_alloc: bootmem limit with movablecore_map")

 42f47e2 ("page_alloc: make movablemem_map have higher priority")

 6981ec3 ("page_alloc: introduce zone_movable_limit[] to keep
    movable limit for nodes")

 34b71f1 ("page_alloc: add movable_memmap kernel parameter")

 4d59a75 ("x86: get pg_data_t's memory from other node")

Later we should have patches that will make sure kernel put page table
and vmemmap on local node ram instead of push them down to node0. Also
need to find way to put other kernel used ram to local node ram.

Reported-by: Tim Gardner <tim.gardner@canonical.com>
Reported-by: Don Morris <don.morris@hp.com>
Bisected-by: Don Morris <don.morris@hp.com>
Tested-by: Don Morris <don.morris@hp.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Thomas Renninger <trenn@suse.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
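
Point 1 of the message is easier to follow with a small model. The sketch below is user-space C, not kernel code: `parsed_nodes` and `meminfo` are hypothetical stand-ins for `numa_nodes_parsed` and `numa_meminfo`, and the two initializers are invented for illustration. It only shows why shared state has to be reset before every attempt in a fallback chain.

```c
#include <string.h>

/* Hypothetical stand-ins for numa_nodes_parsed and numa_meminfo. */
static unsigned int parsed_nodes;	/* bitmask of node ids seen so far */
static struct { int nr_blk; } meminfo;	/* number of recorded memory blocks */

/* An initializer that records partial results and then fails. */
static int init_partial_then_fail(void)
{
	parsed_nodes |= 1u << 1;
	meminfo.nr_blk++;
	return -1;
}

/* The last-resort initializer: a single node 0. */
static int init_dummy(void)
{
	parsed_nodes |= 1u << 0;
	meminfo.nr_blk++;
	return 0;
}

/* Reset the shared state before every attempt, then run the initializer. */
static int try_init(int (*init_func)(void))
{
	parsed_nodes = 0;
	memset(&meminfo, 0, sizeof(meminfo));
	return init_func();
}

/* Walk the chain until one initializer succeeds. */
static int init_with_fallback(void)
{
	if (!try_init(init_partial_then_fail))
		return 0;
	return try_init(init_dummy);
}
```

Without the two reset statements in `try_init()`, the fallback would start from whatever the failed initializer left behind, which is the situation the message describes for the fall back path.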
1 parent 14cc0b5 commit 20e6926

10 files changed

Lines changed: 27 additions & 544 deletions

Documentation/kernel-parameters.txt

Lines changed: 0 additions & 36 deletions
@@ -1645,42 +1645,6 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 			that the amount of memory usable for all allocations
 			is not too small.
 
-	movablemem_map=acpi
-			[KNL,X86,IA-64,PPC] This parameter is similar to
-			memmap except it specifies the memory map of
-			ZONE_MOVABLE.
-			This option inform the kernel to use Hot Pluggable bit
-			in flags from SRAT from ACPI BIOS to determine which
-			memory devices could be hotplugged. The corresponding
-			memory ranges will be set as ZONE_MOVABLE.
-			NOTE: Whatever node the kernel resides in will always
-			      be un-hotpluggable.
-
-	movablemem_map=nn[KMG]@ss[KMG]
-			[KNL,X86,IA-64,PPC] This parameter is similar to
-			memmap except it specifies the memory map of
-			ZONE_MOVABLE.
-			If user specifies memory ranges, the info in SRAT will
-			be ingored. And it works like the following:
-			- If more ranges are all within one node, then from
-			  lowest ss to the end of the node will be ZONE_MOVABLE.
-			- If a range is within a node, then from ss to the end
-			  of the node will be ZONE_MOVABLE.
-			- If a range covers two or more nodes, then from ss to
-			  the end of the 1st node will be ZONE_MOVABLE, and all
-			  the rest nodes will only have ZONE_MOVABLE.
-			If memmap is specified at the same time, the
-			movablemem_map will be limited within the memmap
-			areas. If kernelcore or movablecore is also specified,
-			movablemem_map will have higher priority to be
-			satisfied. So the administrator should be careful that
-			the amount of movablemem_map areas are not too large.
-			Otherwise kernel won't have enough memory to start.
-			NOTE: We don't stop users specifying the node the
-			      kernel resides in as hotpluggable so that this
-			      option can be used as a workaround of firmware
-			      bugs.
-
 	MTD_Partition=	[MTD]
 			Format: <name>,<region-number>,<size>,<offset>

arch/x86/kernel/setup.c

Lines changed: 4 additions & 9 deletions
@@ -1056,15 +1056,6 @@ void __init setup_arch(char **cmdline_p)
 	setup_bios_corruption_check();
 #endif
 
-	/*
-	 * In the memory hotplug case, the kernel needs info from SRAT to
-	 * determine which memory is hotpluggable before allocating memory
-	 * using memblock.
-	 */
-	acpi_boot_table_init();
-	early_acpi_boot_init();
-	early_parse_srat();
-
 #ifdef CONFIG_X86_32
 	printk(KERN_DEBUG "initial memory mapped: [mem 0x00000000-%#010lx]\n",
 			(max_pfn_mapped<<PAGE_SHIFT) - 1);
@@ -1110,6 +1101,10 @@ void __init setup_arch(char **cmdline_p)
 	/*
 	 * Parse the ACPI tables for possible boot-time SMP configuration.
 	 */
+	acpi_boot_table_init();
+
+	early_acpi_boot_init();
+
 	initmem_init();
 	memblock_find_dma_reserve();

arch/x86/mm/numa.c

Lines changed: 5 additions & 6 deletions
@@ -212,9 +212,10 @@ static void __init setup_node_data(int nid, u64 start, u64 end)
 	 * Allocate node data. Try node-local memory and then any node.
 	 * Never allocate in DMA zone.
 	 */
-	nd_pa = memblock_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid);
+	nd_pa = memblock_alloc_nid(nd_size, SMP_CACHE_BYTES, nid);
 	if (!nd_pa) {
-		pr_err("Cannot find %zu bytes in any node\n", nd_size);
+		pr_err("Cannot find %zu bytes in node %d\n",
+		       nd_size, nid);
 		return;
 	}
 	nd = __va(nd_pa);
@@ -559,12 +560,10 @@ static int __init numa_init(int (*init_func)(void))
 	for (i = 0; i < MAX_LOCAL_APIC; i++)
 		set_apicid_to_node(i, NUMA_NO_NODE);
 
-	/*
-	 * Do not clear numa_nodes_parsed or zero numa_meminfo here, because
-	 * SRAT was parsed earlier in early_parse_srat().
-	 */
+	nodes_clear(numa_nodes_parsed);
 	nodes_clear(node_possible_map);
 	nodes_clear(node_online_map);
+	memset(&numa_meminfo, 0, sizeof(numa_meminfo));
 	WARN_ON(memblock_set_node(0, ULLONG_MAX, MAX_NUMNODES));
 	numa_reset_distance();

arch/x86/mm/srat.c

Lines changed: 3 additions & 122 deletions
@@ -141,126 +141,11 @@ static inline int save_add_info(void) {return 1;}
 static inline int save_add_info(void) {return 0;}
 #endif
 
-#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
-static void __init
-handle_movablemem(int node, u64 start, u64 end, u32 hotpluggable)
-{
-	int overlap, i;
-	unsigned long start_pfn, end_pfn;
-
-	start_pfn = PFN_DOWN(start);
-	end_pfn = PFN_UP(end);
-
-	/*
-	 * For movablemem_map=acpi:
-	 *
-	 * SRAT:           |_____| |_____| |_________| |_________| ......
-	 * node id:           0       1         1           2
-	 * hotpluggable:      n       y         y           n
-	 * movablemem_map:         |_____| |_________|
-	 *
-	 * Using movablemem_map, we can prevent memblock from allocating memory
-	 * on ZONE_MOVABLE at boot time.
-	 *
-	 * Before parsing SRAT, memblock has already reserve some memory ranges
-	 * for other purposes, such as for kernel image. We cannot prevent
-	 * kernel from using these memory, so we need to exclude these memory
-	 * even if it is hotpluggable.
-	 * Furthermore, to ensure the kernel has enough memory to boot, we make
-	 * all the memory on the node which the kernel resides in
-	 * un-hotpluggable.
-	 */
-	if (hotpluggable && movablemem_map.acpi) {
-		/* Exclude ranges reserved by memblock. */
-		struct memblock_type *rgn = &memblock.reserved;
-
-		for (i = 0; i < rgn->cnt; i++) {
-			if (end <= rgn->regions[i].base ||
-			    start >= rgn->regions[i].base +
-			    rgn->regions[i].size)
-				continue;
-
-			/*
-			 * If the memory range overlaps the memory reserved by
-			 * memblock, then the kernel resides in this node.
-			 */
-			node_set(node, movablemem_map.numa_nodes_kernel);
-
-			goto out;
-		}
-
-		/*
-		 * If the kernel resides in this node, then the whole node
-		 * should not be hotpluggable.
-		 */
-		if (node_isset(node, movablemem_map.numa_nodes_kernel))
-			goto out;
-
-		insert_movablemem_map(start_pfn, end_pfn);
-
-		/*
-		 * numa_nodes_hotplug nodemask represents which nodes are put
-		 * into movablemem_map.map[].
-		 */
-		node_set(node, movablemem_map.numa_nodes_hotplug);
-		goto out;
-	}
-
-	/*
-	 * For movablemem_map=nn[KMG]@ss[KMG]:
-	 *
-	 * SRAT:           |_____| |_____| |_________| |_________| ......
-	 * node id:           0       1         1           2
-	 * user specified:           |__|                 |___|
-	 * movablemem_map:           |___| |_________|    |______| ......
-	 *
-	 * Using movablemem_map, we can prevent memblock from allocating memory
-	 * on ZONE_MOVABLE at boot time.
-	 *
-	 * NOTE: In this case, SRAT info will be ingored.
-	 */
-	overlap = movablemem_map_overlap(start_pfn, end_pfn);
-	if (overlap >= 0) {
-		/*
-		 * If part of this range is in movablemem_map, we need to
-		 * add the range after it to extend the range to the end
-		 * of the node, because from the min address specified to
-		 * the end of the node will be ZONE_MOVABLE.
-		 */
-		start_pfn = max(start_pfn,
-			    movablemem_map.map[overlap].start_pfn);
-		insert_movablemem_map(start_pfn, end_pfn);
-
-		/*
-		 * Set the nodemask, so that if the address range on one node
-		 * is not continuse, we can add the subsequent ranges on the
-		 * same node into movablemem_map.
-		 */
-		node_set(node, movablemem_map.numa_nodes_hotplug);
-	} else {
-		if (node_isset(node, movablemem_map.numa_nodes_hotplug))
-			/*
-			 * Insert the range if we already have movable ranges
-			 * on the same node.
-			 */
-			insert_movablemem_map(start_pfn, end_pfn);
-	}
-out:
-	return;
-}
-#else		/* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
-static inline void
-handle_movablemem(int node, u64 start, u64 end, u32 hotpluggable)
-{
-}
-#endif		/* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
-
 /* Callback for parsing of the Proximity Domain <-> Memory Area mappings */
 int __init
 acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
 {
 	u64 start, end;
-	u32 hotpluggable;
 	int node, pxm;
 
 	if (srat_disabled())
@@ -269,8 +154,7 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
 		goto out_err_bad_srat;
 	if ((ma->flags & ACPI_SRAT_MEM_ENABLED) == 0)
 		goto out_err;
-	hotpluggable = ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE;
-	if (hotpluggable && !save_add_info())
+	if ((ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) && !save_add_info())
 		goto out_err;
 
 	start = ma->base_address;
@@ -290,12 +174,9 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
 
 	node_set(node, numa_nodes_parsed);
 
-	printk(KERN_INFO "SRAT: Node %u PXM %u [mem %#010Lx-%#010Lx] %s\n",
+	printk(KERN_INFO "SRAT: Node %u PXM %u [mem %#010Lx-%#010Lx]\n",
 	       node, pxm,
-	       (unsigned long long) start, (unsigned long long) end - 1,
-	       hotpluggable ? "Hot Pluggable": "");
-
-	handle_movablemem(node, start, end, hotpluggable);
+	       (unsigned long long) start, (unsigned long long) end - 1);
 
 	return 0;
 out_err_bad_srat:
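
After the revert, the only use of the hot-pluggable flag in `acpi_numa_memory_affinity_init()` is the early filter on `ma->flags`. A minimal user-space sketch of that filter follows. The function name `srat_mem_entry_accepted` and the `keep_hotplug` parameter (standing in for the result of `save_add_info()`) are hypothetical, and the two bit values are assumed to match ACPICA's `ACPI_SRAT_MEM_ENABLED` and `ACPI_SRAT_MEM_HOT_PLUGGABLE`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror the ACPICA SRAT memory-affinity flag bits. */
#define SRAT_MEM_ENABLED	(1u << 0)
#define SRAT_MEM_HOT_PLUGGABLE	(1u << 1)

/*
 * Hypothetical model of the flag checks: returns true if an entry with
 * these flags would be processed further. keep_hotplug models the
 * return value of save_add_info().
 */
static bool srat_mem_entry_accepted(uint32_t flags, bool keep_hotplug)
{
	if ((flags & SRAT_MEM_ENABLED) == 0)
		return false;	/* disabled entries are skipped */
	if ((flags & SRAT_MEM_HOT_PLUGGABLE) && !keep_hotplug)
		return false;	/* hot-pluggable range, but not wanted */
	return true;
}
```

The model covers only the two flag tests visible in the hunk; the real function performs further checks on the range and the proximity domain.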

drivers/acpi/numa.c

Lines changed: 10 additions & 13 deletions
@@ -282,10 +282,10 @@ acpi_table_parse_srat(enum acpi_srat_type id,
 					    handler, max_entries);
 }
 
-static int srat_mem_cnt;
-
-void __init early_parse_srat(void)
+int __init acpi_numa_init(void)
 {
+	int cnt = 0;
+
 	/*
 	 * Should not limit number with cpu num that is from NR_CPUS or nr_cpus=
 	 * SRAT cpu entries could have different order with that in MADT.
@@ -295,24 +295,21 @@ void __init early_parse_srat(void)
 	/* SRAT: Static Resource Affinity Table */
 	if (!acpi_table_parse(ACPI_SIG_SRAT, acpi_parse_srat)) {
 		acpi_table_parse_srat(ACPI_SRAT_TYPE_X2APIC_CPU_AFFINITY,
-				      acpi_parse_x2apic_affinity, 0);
+				     acpi_parse_x2apic_affinity, 0);
 		acpi_table_parse_srat(ACPI_SRAT_TYPE_CPU_AFFINITY,
-				      acpi_parse_processor_affinity, 0);
-		srat_mem_cnt = acpi_table_parse_srat(ACPI_SRAT_TYPE_MEMORY_AFFINITY,
-						     acpi_parse_memory_affinity,
-						     NR_NODE_MEMBLKS);
+				     acpi_parse_processor_affinity, 0);
+		cnt = acpi_table_parse_srat(ACPI_SRAT_TYPE_MEMORY_AFFINITY,
+					    acpi_parse_memory_affinity,
+					    NR_NODE_MEMBLKS);
 	}
-}
 
-int __init acpi_numa_init(void)
-{
 	/* SLIT: System Locality Information Table */
 	acpi_table_parse(ACPI_SIG_SLIT, acpi_parse_slit);
 
 	acpi_numa_arch_fixup();
 
-	if (srat_mem_cnt < 0)
-		return srat_mem_cnt;
+	if (cnt < 0)
+		return cnt;
 	else if (!parsed_numa_memblks)
 		return -ENOENT;
 	return 0;
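
The tail of the restored `acpi_numa_init()` folds two conditions into one return code. As a hedged illustration, the stand-alone function below reproduces only that decision. `numa_init_result` is a hypothetical name; its parameters model the local `cnt` and the file-scope `parsed_numa_memblks` from the diff.

```c
#include <errno.h>

/*
 * Hypothetical model of the final checks in acpi_numa_init():
 * cnt is the result of parsing the memory-affinity entries (0 when no
 * SRAT was found, since the local starts at 0), parsed_memblks counts
 * the memory blocks that were accepted.
 */
static int numa_init_result(int cnt, int parsed_memblks)
{
	if (cnt < 0)
		return cnt;		/* propagate the parse error */
	else if (!parsed_memblks)
		return -ENOENT;		/* nothing usable was parsed */
	return 0;
}
```

Initializing `cnt` to 0 matters here: when no SRAT is present the parse is skipped, and the function still reaches the `-ENOENT` branch instead of reading an uninitialized value.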

include/linux/acpi.h

Lines changed: 0 additions & 8 deletions
@@ -485,14 +485,6 @@ static inline bool acpi_driver_match_device(struct device *dev,
 
 #endif	/* !CONFIG_ACPI */
 
-#ifdef CONFIG_ACPI_NUMA
-void __init early_parse_srat(void);
-#else
-static inline void early_parse_srat(void)
-{
-}
-#endif
-
 #ifdef CONFIG_ACPI
 void acpi_os_set_prepare_sleep(int (*func)(u8 sleep_state,
 			       u32 pm1a_ctrl, u32 pm1b_ctrl));

include/linux/memblock.h

Lines changed: 0 additions & 2 deletions
@@ -42,7 +42,6 @@ struct memblock {
 
 extern struct memblock memblock;
 extern int memblock_debug;
-extern struct movablemem_map movablemem_map;
 
 #define memblock_dbg(fmt, ...) \
 	if (memblock_debug) printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
@@ -61,7 +60,6 @@ int memblock_reserve(phys_addr_t base, phys_addr_t size);
 void memblock_trim_memory(phys_addr_t align);
 
 #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
-
 void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
 			  unsigned long *out_end_pfn, int *out_nid);

include/linux/mm.h

Lines changed: 0 additions & 18 deletions
@@ -1333,24 +1333,6 @@ extern void free_bootmem_with_active_regions(int nid,
 						unsigned long max_low_pfn);
 extern void sparse_memory_present_with_active_regions(int nid);
 
-#define MOVABLEMEM_MAP_MAX MAX_NUMNODES
-struct movablemem_entry {
-	unsigned long start_pfn;    /* start pfn of memory segment */
-	unsigned long end_pfn;      /* end pfn of memory segment (exclusive) */
-};
-
-struct movablemem_map {
-	bool acpi;	/* true if using SRAT info */
-	int nr_map;
-	struct movablemem_entry map[MOVABLEMEM_MAP_MAX];
-	nodemask_t numa_nodes_hotplug;	/* on which nodes we specify memory */
-	nodemask_t numa_nodes_kernel;	/* on which nodes kernel resides in */
-};
-
-extern void __init insert_movablemem_map(unsigned long start_pfn,
-					 unsigned long end_pfn);
-extern int __init movablemem_map_overlap(unsigned long start_pfn,
-					 unsigned long end_pfn);
 #endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
 
 #if !defined(CONFIG_HAVE_MEMBLOCK_NODE_MAP) && \
